Using the Output Embedding to Improve Language Models
Abstract
We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves more like the output embedding than like the input embedding of the untied model. We also offer a new method of regularizing the output embedding. We show that our methods significantly reduce perplexity across a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
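The update-rule argument can be illustrated with a minimal sketch. It assumes a deliberately simplified toy model (no recurrent layer: the hidden state is just the input word's embedding row, `h = U[input_word]`, and the logits are `V h`). All function names and the initial values are hypothetical and chosen for illustration; this is not the authors' code, and the proposed output-embedding regularizer is not shown. Under softmax cross-entropy, every row of the output embedding `V` receives a gradient on every step, while the input embedding `U` is updated only in the row of the current input word. With tying (`U` and `V` are the same matrix `S`), `S` receives both updates, so most rows change exactly as an output embedding would. The last helper shows the parameter arithmetic: tying stores one vocab × dim matrix instead of two.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def step_gradients(U, V, input_word, target_word):
    """Cross-entropy gradients for a toy model: h = U[input_word], logits = V h.

    U is the input embedding and V the output embedding, both vocab x dim
    lists of lists. Returns (grad_U, grad_V) for the loss -log p(target_word).
    """
    h = U[input_word]
    dim = len(h)
    logits = [sum(v_k * h_k for v_k, h_k in zip(row, h)) for row in V]
    p = softmax(logits)
    # dL/dlogit_j = p_j - 1[j == target_word]; nonzero for every j.
    dlogits = [p_j - (1.0 if j == target_word else 0.0) for j, p_j in enumerate(p)]
    # Output embedding: every row j is updated by (p_j - y_j) * h.
    grad_V = [[d * h_k for h_k in h] for d in dlogits]
    # Input embedding: only the input word's row receives sum_j dlogit_j * V[j].
    grad_U = [[0.0] * dim for _ in U]
    grad_U[input_word] = [
        sum(dlogits[j] * V[j][k] for j in range(len(V))) for k in range(dim)
    ]
    return grad_U, grad_V


def tied_gradient(S, input_word, target_word):
    # With tying, U and V are the same matrix S, so its gradient is the sum.
    grad_in, grad_out = step_gradients(S, S, input_word, target_word)
    return [[a + b for a, b in zip(ri, ro)] for ri, ro in zip(grad_in, grad_out)]


def nonzero_rows(grad):
    # Indices of embedding rows that receive any update on this step.
    return [i for i, row in enumerate(grad) if any(x != 0.0 for x in row)]


def embedding_params(vocab, dim, tied):
    # Tying stores one vocab x dim matrix instead of separate input/output ones.
    return vocab * dim if tied else 2 * vocab * dim


if __name__ == "__main__":
    vocab, dim = 5, 3
    # Hypothetical deterministic initialisation, for illustration only.
    U = [[0.1 * (i + 1) - 0.05 * k for k in range(dim)] for i in range(vocab)]
    V = [[0.2 * math.sin(i + k + 1) for k in range(dim)] for i in range(vocab)]
    gU, gV = step_gradients(U, V, input_word=2, target_word=4)
    print("untied input rows updated: ", nonzero_rows(gU))
    print("untied output rows updated:", nonzero_rows(gV))
    print("tied rows updated:         ", nonzero_rows(tied_gradient(U, 2, 4)))
    print(embedding_params(10_000, 650, tied=False),
          embedding_params(10_000, 650, tied=True))
```

In the demo, only row 2 of the untied input embedding can change, while all five output rows do; the tied matrix is updated on every row. For a hypothetical 10,000-word vocabulary with 650-dimensional embeddings, tying saves 6,500,000 parameters.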
Similar resources
Slim Embedding Layers for Recurrent Neural Language Models
Recurrent neural language models are the state-of-the-art models for language modeling. When the vocabulary size is large, the space taken to store the model parameters becomes the bottleneck for the use of recurrent neural language models. In this paper, we introduce a simple space compression method that randomly shares the structured parameters at both the input and output embedding layers o...
The Optimal Steering Control System using Imperialist Competitive Algorithm on Vehicles with Steer-by-Wire System
Steer-by-wire is an electrical steering system for vehicles which, combined with an optimal control system, is expected to improve the vehicle's dynamic performance. This paper aims to optimize the control systems, namely Fuzzy Logic Control (FLC) and Proportional, Integral and Derivative (PID) control, of the vehicle steering system using the Imperialist Competitive Algorithm (IC...
Link Prediction using Network Embedding based on Global Similarity
Background: Link prediction is one of the most widely studied problems in complex network analysis. Link prediction requires knowledge of previous link connections combined with the available information. Local link prediction approaches based on node structure are fast but not accurate enough. On the other hand, the global link predicti...
Phishing website detection using weighted feature line embedding
The aim of phishing is to obtain users' private information without their permission by designing a website that mimics a trusted one. Information technology specialists do not agree on a single definition of the discriminative features that characterize phishing websites. Therefore, the number of reliable training samples in phishing detection problems is limited. M...
Efficient DMUs improvement based on input expenses reduction using data envelopment analysis
Nowadays, the main purpose of models designed with Data Envelopment Analysis (DEA) is to improve outputs. In the method proposed by Khodabakhshi, which uses an output-oriented BCC model, the output increases as the input increases. In this article, we discuss efficient Decision Making Units (DMUs) in the input-oriented BCC model to reduce input expenses signifi...